
For the IRIS-HEP organization we need to collect publications and update our webpage regularly. I also have to do this for our group webpage, my CV, etc., and it's annoying to copy and paste all of this information. The INSPIRE website lets you export BibTeX, LaTeX, and plain text for individual papers or for a set of papers matching a search, but that's still not very convenient for web stuff, where Markdown is common. The IRIS-HEP webpage is built with Jekyll, which can parse YAML files to generate publication pages. So I wanted a tool that could ingest a bunch of paper identifiers and output YAML.

The new INSPIRE beta has a more modern API, so I wanted to try that out. Here's a repo with what I came up with while at CERN. It takes a list of INSPIRE record IDs (recids) as input:

recid_unpublished = 1726790 #notpublished
recid_published = 1705857 #published
list_of_recids = [recid_published, recid_unpublished]

which yields

- arxiv_eprint: '1811.12113'
  authors: Aaboud, Morad; Aad, Georges; Abbott, Brad; Abdinov, Ovsat; Abeloos, Baptiste;
    et. al.
  collaboration: ATLAS
  creation_date: '2018-11-30'
  doi: 10.1007/JHEP04(2019)046
  journal_title: JHEP
  journal_volume: '04'
  journal_year: 2019
  page_start: '046'
  recid: 1705857
  title: Measurements of fiducial and differential cross-sections of $t\bar{t}$ production
    with additional heavy-flavour jets in proton-proton collisions at $\sqrt{s}$ =
    13 TeV with the ATLAS detector
- arxiv_eprint: '1903.10563'
  authors: Carleo, Giuseppe; Cirac, Ignacio; Cranmer, Kyle; Daudet, Laurent; Schuld,
    Maria; et. al.
  creation_date: '2019-03-27'
  recid: 1726790
  title: Machine learning and the physical sciences
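For pages written directly in Markdown rather than generated by Jekyll from YAML, entries like these can also be turned into list items. The following is only a sketch: `record_to_markdown` is a hypothetical helper that is not part of the notebook, and the citation layout is my own assumption. The two dictionaries reuse values from the output above.

```python
def record_to_markdown(rec):
    """Render one summary dictionary as a Markdown list item.

    Hypothetical helper for illustration; the layout is an assumption.
    """
    authors = rec["authors"]
    # Accept either a list of names or an already-joined string.
    if isinstance(authors, list):
        authors = "; ".join(authors)
    parts = [authors, "*" + rec["title"] + "*"]
    # Journal details are only present for published records.
    if "journal_title" in rec:
        parts.append("{} {} ({}) {}".format(
            rec["journal_title"], rec["journal_volume"],
            rec["journal_year"], rec["page_start"]))
    if "doi" in rec:
        parts.append("doi:" + rec["doi"])
    if "arxiv_eprint" in rec:
        parts.append("arXiv:" + rec["arxiv_eprint"])
    return "- " + ", ".join(parts)


published = {
    "recid": 1705857,
    "title": "Measurements of fiducial and differential cross-sections",
    "authors": "Aaboud, Morad; Aad, Georges; et. al.",
    "journal_title": "JHEP",
    "journal_volume": "04",
    "journal_year": 2019,
    "page_start": "046",
    "doi": "10.1007/JHEP04(2019)046",
    "arxiv_eprint": "1811.12113",
}
unpublished = {
    "recid": 1726790,
    "title": "Machine learning and the physical sciences",
    "authors": "Carleo, Giuseppe; Cirac, Ignacio; et. al.",
    "arxiv_eprint": "1903.10563",
}

markdown_lines = [record_to_markdown(r) for r in (published, unpublished)]
print("\n".join(markdown_lines))
```

Records without `publication_info` or a DOI simply skip those parts, which mirrors how the optional keys are handled in the notebook.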

Here's an excerpt from the notebook:

Convert INSPIRE IDs into short Python dictionaries for a website

by Kyle Cranmer April 14, 2019

In [1]:
import requests
import json
In [2]:
#if you are running on Binder, you will need to uncomment the next line and execute it
#!pip install pyyaml 
In [3]:
import yaml
In [4]:
recid_unpublished = 1726790 #notpublished
recid_published = 1705857 #published
recid = recid_unpublished
url = ''+str(recid)
In [5]:
def summarize_record(recid):
    url = ''+str(recid)
    max_authors = 5
    r = requests.get(url)
    data = r.json()['metadata']
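    mini_dict = {'recid':recid}
    # titles is a list of title records; key names assumed from the INSPIRE record schema
    mini_dict.update({'title': data['titles'][0]['title']})
    if len(data['authors'])>max_authors:
        # long author lists: keep the first max_authors names and append 'et. al.'
        #mini_dict.update({'authors':[a['full_name'] for a in data['authors'][:max_authors]]+['et. al.']})
        mini_dict.update({'authors':"; ".join([a['full_name'] for a in data['authors'][:max_authors]]+['et. al.'])})
    else:
        # short author lists are kept in full
        mini_dict.update({'authors':[a['full_name'] for a in data['authors']]})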

    if 'collaborations' in data:
        mini_dict.update({'collaboration': data['collaborations'][0]['value']})

    mini_dict.update({'arxiv_eprint': data['arxiv_eprints'][0]['value']})
    mini_dict.update({'url': ''+data['arxiv_eprints'][0]['value']})
    mini_dict.update({'creation_date': data['legacy_creation_date']})

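    if 'publication_info' in data:
        # use the first publication entry; key names assumed from the INSPIRE record schema
        pub = data['publication_info'][0]
        mini_dict.update({'journal_title': pub['journal_title']})
        mini_dict.update({'journal_volume': pub['journal_volume']})
        mini_dict.update({'page_start': pub['page_start']})
        mini_dict.update({'journal_year': pub['year']})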
    if 'dois' in data:
        mini_dict.update({'doi': data['dois'][0]['value']})
    return mini_dict
In [6]:
def summarize_records(recids):
    return {'publications':[summarize_record(recid) for recid in recids]}
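The author-truncation rule in `summarize_record` can be tried out without any network access. This is a minimal sketch, not the notebook's code: `format_authors` is a hypothetical helper, the sixth name is a placeholder, and returning a joined string in both cases is my own simplification.

```python
def format_authors(full_names, max_authors=5):
    """Join author names, truncating long lists with 'et. al.'.

    Hypothetical helper; always returns a single string.
    """
    if len(full_names) > max_authors:
        # Keep only the first max_authors names and mark the truncation.
        return "; ".join(full_names[:max_authors] + ["et. al."])
    return "; ".join(full_names)


names = [
    "Carleo, Giuseppe",
    "Cirac, Ignacio",
    "Cranmer, Kyle",
    "Daudet, Laurent",
    "Schuld, Maria",
    "Author, Sixth",  # placeholder name, only there to trigger truncation
]

long_list = format_authors(names)
short_list = format_authors(names[:2])
print(long_list)
print(short_list)
```

A list of exactly `max_authors` names is kept whole, since the comparison is strict.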

example summarizing 2 individual records

In [7]:
{'recid': 1705857,
 'title': 'Measurements of fiducial and differential cross-sections of $t\\bar{t}$ production with additional heavy-flavour jets in proton-proton collisions at $\\sqrt{s}$ = 13 TeV with the ATLAS detector',
 'authors': 'Aaboud, Morad; Aad, Georges; Abbott, Brad; Abdinov, Ovsat; Abeloos, Baptiste; et. al.',
 'collaboration': 'ATLAS',
 'arxiv_eprint': '1811.12113',
 'url': '',
 'creation_date': '2018-11-30',
 'journal_title': 'JHEP',
 'journal_volume': '04',
 'page_start': '046',
 'journal_year': 2019,
 'doi': '10.1007/JHEP04(2019)046'}
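A dictionary like this one can then be serialized to YAML with PyYAML, the package imported earlier in the notebook. The snippet below is a sketch under assumptions: it uses a hard-coded record taken from the values shown in this post instead of calling the API, and dumping a bare list (rather than a dictionary with a `publications` key) is my choice to match the YAML shown at the top.

```python
import yaml

# Hard-coded stand-in for the output of summarize_record (no network call).
record = {
    "recid": 1726790,
    "title": "Machine learning and the physical sciences",
    "authors": "Carleo, Giuseppe; Cirac, Ignacio; Cranmer, Kyle; "
               "Daudet, Laurent; Schuld, Maria; et. al.",
    "arxiv_eprint": "1903.10563",
    "creation_date": "2019-03-27",
}

# Block style (one key per line) is what Jekyll data files usually look like.
yaml_text = yaml.safe_dump([record], default_flow_style=False)
print(yaml_text)

# Round trip: parsing the text gives back the same Python structure.
restored = yaml.safe_load(yaml_text)
```

Keys come out in alphabetical order by default, which is why `arxiv_eprint` leads each entry.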

